Maximum a posteriori pruning on decision trees and its application to bootstrap BUMPing

Authors

  • Jinseog Kim
  • Yongdai Kim
Abstract

Cost-complexity pruning generates a sequence of nested subtrees and selects the best one. However, its computational cost is high because it relies on a holdout sample or cross-validation. Pruning algorithms based on posterior calculations, such as BIC (MDL) and MEP, are faster, but they sometimes produce trees that are too large or too small, which leads to poor generalization error. In this paper, we propose an alternative pruning procedure that combines the ideas of cost-complexity pruning and posterior calculation. The proposed algorithm uses only the training sample, so its computational cost is almost the same as that of the other posterior-based algorithms, while its accuracy is similar to that of cost-complexity pruning. Moreover, it can be used to compare non-nested trees, which is necessary for the BUMPing procedure. The empirical results show that the proposed algorithm performs similarly to cost-complexity pruning in standard situations and works better for BUMPing. © 2004 Elsevier B.V. All rights reserved.
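As a rough illustration of how a posterior-style criterion can rank trees using only training data, the sketch below scores each candidate tree with a generic BIC-type penalty computed from its leaf class counts. This is not the MAP criterion proposed in the paper, whose exact form the abstract does not give. The function names, the parameter count of (classes - 1) per leaf, and the example counts are all assumptions made for illustration. Because the score depends only on leaf counts, the candidates do not need to be nested.

```python
import math


def leaf_log_likelihood(counts):
    """Multinomial log-likelihood of one leaf, evaluated at its observed class proportions."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)


def bic_score(leaves):
    """BIC-type score of a tree described by per-leaf class counts (lower is better).

    Assumption: each leaf contributes (number of classes - 1) free parameters.
    """
    n = sum(sum(counts) for counts in leaves)
    n_classes = len(leaves[0])
    log_lik = sum(leaf_log_likelihood(counts) for counts in leaves)
    n_params = len(leaves) * (n_classes - 1)
    return -2.0 * log_lik + n_params * math.log(n)


def select_tree(candidates):
    """Return the name of the candidate tree with the smallest score."""
    return min(candidates, key=lambda name: bic_score(candidates[name]))


# Hypothetical candidates: a tree is a list of leaves, a leaf is a list of class counts.
candidates = {
    "small": [[40, 10], [10, 40]],
    "large": [[20, 5], [20, 5], [5, 20], [5, 20]],
}
best = select_tree(candidates)
```

In this made-up example both trees fit the training data equally well, so the penalty term decides in favour of the tree with fewer leaves.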

Similar resources

Regression tree construction by bootstrap: Model search for DRG-systems applied to Austrian health-data

BACKGROUND DRG-systems are used to allocate resources fairly to hospitals based on their performance. Statistically, this allocation is based on simple rules that can be modeled with regression trees. However, the resulting models often have to be adjusted manually to be medically reasonable and ethical. METHODS Despite the possibility of manual, performance degenerating adaptations of the or...

Learning with data adaptive features

The cost-complexity pruning generates nested subtrees and selects the best one. However, its computational cost is large since it uses hold-out sample or crossvalidation. On the other hand, the pruning algorithms based on posterior calculations such as BIC (MDL) and MEP are faster, but they sometimes produce too big or small trees to yield poor generalization errors. In this paper, we propose a...

The Effect of Fruit Trees Pruning Waste Biochar on some Soil Biological Properties under Rhizobox Conditions

The pyrolysis of fruit tree pruning waste into biochar, combined with microbial inoculation, is a strategy for improving the biological properties of calcareous soils. In order to investigate the effect of biochar on some soil biological properties in the presence of microorganisms, a factorial experiment was carried out in a completely randomized design in the rhizobox under greenhous...

Effects of Pruning on Haloxylon aphyllum L. Dimensions and its Application in Biological Reclamation of Desert Regions in Yazd Province

Knowledge of the dimensions of Saxaul used in sand dune stabilization is considered essential for designing live windbreaks in desert regions. This research aimed to collect and analyze data and was performed on the pruned and control shrubs of Haloxylon aphyllum L. in Yazd province, Iran in the last two decades. Our review clearly showed the superiority of shrubs pruned at the height of 35 cm i...

Majority-rule reduced consensus trees and their use in bootstrapping.

Bootstrap analyses are usually summarized with majority-rule component consensus trees. This consensus method is based on replicated components and, like all component consensus methods, it is insensitive to other kinds of agreement between trees. Recently developed reduced consensus methods can be used to summarize much additional agreement on hypothesised phylogenetic relationships among mult...

Journal:
  • Computational Statistics & Data Analysis

Volume: 50  Issue:

Pages: -

Publication date: 2006